ng(hostPort).hasPort, message)
}
On Wed, Oct 14, 2015 at 2:40 PM, Thomas Dudziak wrote:
> It looks like Spark 1.5.1 does not work with IPv6. When
> adding -Djava.net.preferIPv6Addresses=true on my dual stack server, the
> driver fails with:
>
> 15/10/14 14:36:01 ERROR SparkConte
It looks like Spark 1.5.1 does not work with IPv6. When
adding -Djava.net.preferIPv6Addresses=true on my dual stack server, the
driver fails with:
15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: Expected hostname
at scala.Predef$.a
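For context, the assertion being hit here comes from the host checks in org.apache.spark.util.Utils. A rough reconstruction of what those checks look like in the 1.5.x source is sketched below (not a fix, just the origin of the "Expected hostname" message): a bare IPv6 literal contains ':' and therefore trips the assertion even though it is a valid address.

    // Reconstructed sketch of the 1.5.x host checks (org.apache.spark.util.Utils)
    def checkHost(host: String, message: String = ""): Unit = {
      // an IPv6 literal such as "::1" or "2001:db8::1" contains ':' and fails here
      assert(host.indexOf(':') == -1, message)
    }
    def checkHostPort(hostPort: String, message: String = ""): Unit = {
      assert(hostPort.indexOf(':') != -1, message)
    }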
http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop
I would be curious to learn what the Spark developers' plans are in this
area (NNs, GPUs) and what they think of integration with existing NN
frameworks like Caffe or Torch.
cheers,
Tom
I want to use t-digest with foreachPartition and accumulators (essentially,
create a t-digest per partition and add that to the accumulator leveraging
the fact that t-digests can be added to each other). I can make t-digests
Kryo-serializable easily, but making them Java-serializable is not very easy.
Now, when
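A minimal sketch of the per-partition approach described above, assuming a spark-shell style sc, Spark 1.x's AccumulatorParam API, and the t-digest library's TDigest.add(other) merge method. Note that accumulator updates travel back to the driver with the task result via Java serialization, which is exactly the sticking point raised here.

    import com.tdunning.math.stats.TDigest   // assumes the t-digest library is on the classpath
    import org.apache.spark.AccumulatorParam

    // Merge one TDigest per partition into a single digest on the driver.
    object TDigestAccumulatorParam extends AccumulatorParam[TDigest] {
      override def zero(initial: TDigest): TDigest = TDigest.createDigest(100.0)
      override def addInPlace(t1: TDigest, t2: TDigest): TDigest = {
        t1.add(t2)   // assumes TDigest.add(other) merges digests, as described above
        t1
      }
    }

    val digestAcc = sc.accumulator(TDigest.createDigest(100.0))(TDigestAccumulatorParam)
    val data = sc.parallelize(1 to 1000000).map(_.toDouble)   // placeholder data
    data.foreachPartition { values =>
      val local = TDigest.createDigest(100.0)
      values.foreach(v => local.add(v))
      digestAcc += local   // one merge per partition, not per element
    }
    // digestAcc.value now holds the combined digest (shipped back via Java serialization)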
aking it slower. SMJ (sort-merge join) performance is probably 5x-1000x better
> in 1.5 for your case.
>
>
> On Thu, Aug 27, 2015 at 6:03 PM, Thomas Dudziak wrote:
>
>> I'm getting errors like "Removing executor with no recent heartbeats" &
>> "Missing an output lo
imilar problems to this (reduce side failures for large joins (25bn
> rows with 9bn)), and found the answer was to raise
> spark.sql.shuffle.partitions beyond 1000. In my case, 16k partitions worked for
> me, but your tables look a little denser, so you may want to go even higher.
>
> On Thu, Aug 27, 2015 at 6:04 PM Thomas Dudziak wrote:
>
>> I'm getting err
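A minimal sketch of the tuning suggested above, assuming a spark-shell style sqlContext; the value is illustrative (16k is what worked for the poster) and workload-dependent.

    // Raise the SQL shuffle parallelism well past the default before running the big join.
    sqlContext.setConf("spark.sql.shuffle.partitions", "16384")
    // equivalently at submit time:
    //   spark-submit --conf spark.sql.shuffle.partitions=16384 ...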
I'm getting "Removing executor with no recent heartbeats" and
"Missing an output location for shuffle" errors for a large Spark SQL join
(1bn rows/2.5TB joined with 1bn rows/30GB) and I'm not sure how to
configure the job to avoid them.
The initial stage completes fine with some 30k tasks
>
> Have you tried tablesample? You'll find the exact syntax in the
> documentation, but it does exactly what you want.
>
> On Wed, Aug 26, 2015 at 6:12 PM, Thomas Dudziak wrote:
>
>> Sorry, I meant without reading from all splits. This is a single
>> partition in the tab
Sorry, I meant without reading from all splits. This is a single partition
in the table.
On Wed, Aug 26, 2015 at 8:53 AM, Thomas Dudziak wrote:
> I have a sizeable table (2.5T, 1b rows) that I want to get ~100m rows from
> and I don't particularly care which rows. Doing a LIMIT un
I have a sizeable table (2.5T, 1b rows) that I want to get ~100m rows from
and I don't particularly care which rows. Doing a LIMIT unfortunately
results in two stages where the first stage reads the whole table, and the
second then performs the limit with a single worker, which is not very
efficient.
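A minimal sketch of the two sampling routes discussed in this thread, assuming a spark-shell style sqlContext and a placeholder table name; neither needs the single-worker LIMIT stage described above.

    // Option 1 (the reply's suggestion): Hive's TABLESAMPLE, which can sample at
    // the block/split level and so avoid touching every split -- the exact syntax
    // and support depend on the table format, see the Hive documentation, e.g.
    //   SELECT * FROM big_table TABLESAMPLE (10 PERCENT)
    // Option 2: DataFrame-side sampling. This still scans every split, but it
    // avoids the single-worker second stage that LIMIT produces:
    val roughly100m = sqlContext.table("big_table")
      .sample(withReplacement = false, fraction = 0.1)   // ~10% of ~1bn rows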
Under certain circumstances that I haven't yet been able to isolate, I get
the following error when doing a HQL query using HiveContext (Spark 1.3.1
on Mesos, fine-grained mode). Is this a known problem or should I file a
JIRA for it?
org.apache.spark.SparkException: Can only zip RDDs with same
grained scheduler, there is a spark.cores.max config setting that
> will limit the total # of cores it grabs. This was there in earlier
> versions too.
>
> Matei
>
> > On May 19, 2015, at 12:39 PM, Thomas Dudziak wrote:
> >
> > I read the other day that there will b
I read the other day that there will be a fair number of improvements in
1.4 for Mesos. Could I ask for one more (if it isn't already in there): a
configurable limit for the number of tasks for jobs run on Mesos? This
would be a very simple yet effective way to prevent a job from dominating the
cluster.
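A minimal sketch of the spark.cores.max cap Matei mentions below; the app name, master URL, and value are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap the total number of cores this one job may hold across the Mesos cluster.
    val conf = new SparkConf()
      .setAppName("capped-job")                      // placeholder
      .setMaster("mesos://zk://host:2181/mesos")     // placeholder master URL
      .set("spark.cores.max", "64")                  // illustrative limit
    val sc = new SparkContext(conf)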
I've just been through this exact case with shaded guava in our Mesos setup
and that is how it behaves there (with Spark 1.3.1).
cheers,
Tom
On Fri, May 15, 2015 at 12:04 PM, Marcelo Vanzin
wrote:
> On Fri, May 15, 2015 at 11:56 AM, Thomas Dudziak wrote:
>
>> Actually t
Actually, the extraClassPath settings put the extra jars at the end of the
classpath, so they won't help. Only the deprecated SPARK_CLASSPATH puts them
at the front.
cheers,
Tom
On Fri, May 15, 2015 at 11:54 AM, Marcelo Vanzin
wrote:
> Ah, I see. Yeah, it sucks that Spark has to expose Optional (
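For reference, a minimal sketch of the settings being discussed (paths are placeholders).

    import org.apache.spark.SparkConf

    // Per this thread, in 1.3.x jars added this way end up at the *end* of the
    // classpath, so they cannot override classes that Spark itself bundles;
    // only the deprecated SPARK_CLASSPATH environment variable prepends them.
    val conf = new SparkConf()
      .set("spark.driver.extraClassPath", "/opt/libs/my-guava.jar")
      .set("spark.executor.extraClassPath", "/opt/libs/my-guava.jar")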
This is still a problem in 1.3. Optional is both used in several shaded
classes within Guava (e.g. the Immutable* classes) and itself uses shaded
classes (e.g. AbstractIterator). This causes problems in application code.
The only reliable way we've found around this is to shade Guava ourselves
for
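One possible reading of "shade Guava ourselves", sketched with sbt-assembly's shade rules (assumes sbt-assembly 0.14+ is enabled in the build; Maven users would use the shade plugin's relocation feature instead). The target package name is a placeholder.

    // build.sbt fragment: relocate Guava classes inside the application jar so
    // the application's Guava never collides with the com.google.common classes
    // that Spark exposes and bundles.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "myshaded.guava.@1").inAll
    )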